Finding different solutions to the same problem is a key aspect of intelligence, associated with creativity and adaptation to novel situations. In reinforcement learning, a set of diverse policies is useful for exploration, transfer, hierarchy, and robustness. We propose Diverse Successive Policies, a method for discovering policies that are diverse in the space of successor features while ensuring that they remain near-optimal. We formalize the problem as a constrained Markov decision process (CMDP), where the goal is to find policies that maximize diversity, characterized by an intrinsic diversity reward, while remaining near-optimal with respect to the extrinsic reward of the MDP. We also analyze how recently proposed robustness and discrimination rewards perform and find that they are sensitive to the initialization of the procedure and can converge to sub-optimal solutions. To alleviate this, we propose new explicit diversity rewards that aim to minimize the correlation between the successor features of the policies in the set. We compare the different diversity mechanisms in the DeepMind Control Suite and find that the type of explicit diversity we propose is important for discovering distinct behaviors, such as different locomotion patterns.
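The abstract does not spell out the exact form of the explicit diversity reward, so the following is a minimal Python sketch of one plausible instantiation, assuming each policy is summarized by a single successor-feature vector and the bonus penalizes cosine correlation with the rest of the set; the function name, normalization, and toy feature vectors are illustrative assumptions, not the paper's definition.

```python
import numpy as np

def diversity_bonus(psi_i, psi_others, eps=1e-8):
    """Illustrative diversity bonus for policy i: the negative mean cosine
    similarity between its successor features psi_i and the successor
    features of the other policies in the set.

    psi_i      : (d,) successor-feature vector of policy i
    psi_others : (k, d) successor-feature vectors of the other policies
    """
    psi_i = psi_i / (np.linalg.norm(psi_i) + eps)
    others = psi_others / (np.linalg.norm(psi_others, axis=1, keepdims=True) + eps)
    # High correlation with the rest of the set -> low (negative) bonus.
    return -np.mean(others @ psi_i)

# Toy usage: three policies with 4-dimensional successor features.
psis = np.array([[1.0, 0.0, 0.0, 0.0],
                 [0.9, 0.1, 0.0, 0.0],
                 [0.0, 0.0, 1.0, 0.0]])
for i in range(len(psis)):
    rest = np.delete(psis, i, axis=0)
    print(f"policy {i}: bonus = {diversity_bonus(psis[i], rest):+.3f}")
```

In this toy set, the third policy (whose features are uncorrelated with the other two) receives the highest bonus, which is the behavior an explicit decorrelation reward is meant to encourage.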
We propose a conceptually simple and lightweight framework for deep reinforcement learning that uses asynchronous gradient descent for optimization of deep neural network controllers. We present asynchronous variants of four standard reinforcement learning algorithms and show that parallel actor-learners have a stabilizing effect on training allowing all four methods to successfully train neural network controllers. The best performing method, an asynchronous variant of actor-critic, surpasses the current state-of-the-art on the Atari domain while training for half the time on a single multi-core CPU instead of a GPU. Furthermore, we show that asynchronous actor-critic succeeds on a wide variety of continuous motor control problems as well as on a new task of navigating random 3D mazes using a visual input.
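To illustrate the asynchronous, lock-free style of optimization described here (not the actual actor-critic updates), the following toy sketch runs several Python threads that apply noisy gradients of a simple quadratic loss to shared parameters; the loss, `TARGET`, and all hyperparameters are made up for the example.

```python
import threading
import numpy as np

# Shared parameters updated without locks by several workers, a toy
# stand-in for the parallel actor-learners described above. (Python
# threads only illustrate the pattern; they are serialized by the GIL.)
theta = np.zeros(2)
TARGET = np.array([3.0, -1.0])   # hypothetical optimum of a toy quadratic loss

def worker(steps=2000, lr=0.01, seed=0):
    rng = np.random.default_rng(seed)
    for _ in range(steps):
        # Each worker computes its own noisy gradient of 0.5 * ||theta - TARGET||^2.
        grad = (theta - TARGET) + 0.1 * rng.standard_normal(2)
        theta[:] -= lr * grad          # asynchronous, unsynchronized update

threads = [threading.Thread(target=worker, kwargs={"seed": s}) for s in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print("theta after asynchronous updates:", theta)
```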
Applying convolutional neural networks to large images is computationally expensive because the amount of computation scales linearly with the number of image pixels. We present a novel recurrent neural network model that is capable of extracting information from an image or video by adaptively selecting a sequence of regions or locations and only processing the selected regions at high resolution. Like convolutional neural networks, the proposed model has a degree of translation invariance built-in, but the amount of computation it performs can be controlled independently of the input image size. While the model is non-differentiable, it can be trained using reinforcement learning methods to learn task-specific policies. We evaluate our model on several image classification tasks, where it significantly outperforms a convolutional neural network baseline on cluttered images, and on a dynamic visual control problem, where it learns to track a simple object without an explicit training signal for doing so.
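A minimal sketch of the core idea that per-step computation is decoupled from image size: a fixed-size patch is cropped around a selected location. The function name, patch size, and padding scheme are assumptions for illustration, not the paper's glimpse sensor (which combines several resolutions with a recurrent core).

```python
import numpy as np

def glimpse(image, center, size=8):
    """Extract a fixed-size patch around `center` (row, col), padding at the
    borders, so the amount of downstream computation does not depend on the
    full image resolution.
    """
    r, c = center
    half = size // 2
    padded = np.pad(image, half, mode="constant")
    r, c = r + half, c + half
    return padded[r - half:r + half, c - half:c + half]

img = np.arange(100 * 120, dtype=float).reshape(100, 120)
patch = glimpse(img, center=(10, 110))
print(patch.shape)   # (8, 8) regardless of the input image size
```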
We present the first deep learning model to successfully learn control policies directly from high-dimensional sensory input using reinforcement learning. The model is a convolutional neural network, trained with a variant of Q-learning, whose input is raw pixels and whose output is a value function estimating future rewards. We apply our method to seven Atari 2600 games from the Arcade Learning Environment, with no adjustment of the architecture or learning algorithm. We find that it outperforms all previous approaches on six of the games and surpasses a human expert on three of them.
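A simplified sketch of the regression target used by a Q-learning variant of this kind, ignoring details such as experience replay: each transition is trained towards y = r + γ·max_a' Q(s', a'), or y = r at terminal states. The array shapes and values below are toy assumptions.

```python
import numpy as np

def q_learning_targets(rewards, next_q_values, dones, gamma=0.99):
    """Bootstrap targets y = r + gamma * max_a' Q(s', a') for non-terminal
    transitions, and y = r at episode ends: the regression target the
    convolutional value network is trained towards.
    """
    return rewards + gamma * (1.0 - dones) * next_q_values.max(axis=1)

rewards = np.array([1.0, 0.0, -1.0])
next_q  = np.array([[0.2, 0.5], [1.0, 0.3], [0.0, 0.0]])
dones   = np.array([0.0, 0.0, 1.0])
print(q_learning_targets(rewards, next_q, dones))  # [1.495, 0.99, -1.0]
```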
The findable, accessible, interoperable, and reusable (FAIR) data principles have provided a framework for examining, evaluating, and improving how we share data with the aim of facilitating scientific discovery. Efforts have been made to generalize these principles to research software and other digital products. Artificial intelligence (AI) models -- algorithms that have been trained on data rather than explicitly programmed -- are an important target for this because of the ever-increasing pace with which AI is transforming scientific and engineering domains. In this paper, we propose a practical definition of FAIR principles for AI models and create a FAIR AI project template that promotes adherence to these principles. We demonstrate how to implement these principles using a concrete example from experimental high energy physics: a graph neural network for identifying Higgs bosons decaying to bottom quarks. We study the robustness of these FAIR AI models and their portability across hardware architectures and software frameworks, and report new insights on the interpretability of AI predictions by studying the interplay between FAIR datasets and AI models. Enabled by publishing FAIR AI models, these studies pave the way toward reliable and automated AI-driven scientific discovery.
Neural posterior estimation methods for simulation-based inference can be ill-suited for handling posterior distributions obtained by conditioning on multiple observations, as they may require a large number of simulator calls to produce accurate approximations. Neural likelihood estimation methods can naturally handle multiple observations, but they require a separate inference step, which can affect their efficiency and performance. We introduce a new method for simulation-based inference that enjoys the benefits of both approaches. We propose to model the posterior distributions induced by individual observations, and introduce a sampling algorithm that combines the learned scores to sample efficiently from the target.
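A minimal sketch of how per-observation posterior scores can be combined, assuming observations that are i.i.d. given the parameters and a shared prior, so that p(θ | x_1..n) ∝ p(θ)^(1-n) ∏_i p(θ | x_i), together with an unadjusted Langevin sampler as one possible sampling algorithm. The Gaussian toy scores stand in for learned score networks and, like the step sizes, are assumptions rather than the paper's method.

```python
import numpy as np

def combined_score(theta, single_obs_scores, prior_score):
    """Score of p(theta | x_1..n) from per-observation posterior scores:
        grad log p(theta|x_1..n) = (1 - n) * grad log p(theta)
                                   + sum_i grad log p(theta|x_i).
    `single_obs_scores` is a list of callables, one per observation.
    """
    n = len(single_obs_scores)
    return (1 - n) * prior_score(theta) + sum(s(theta) for s in single_obs_scores)

def langevin_sample(score_fn, theta0, step=1e-2, n_steps=2000, seed=0):
    """Unadjusted Langevin dynamics driven by the combined score."""
    rng = np.random.default_rng(seed)
    theta = np.array(theta0, dtype=float)
    for _ in range(n_steps):
        theta = theta + step * score_fn(theta) \
                + np.sqrt(2 * step) * rng.standard_normal(theta.shape)
    return theta

# Toy check with analytic Gaussian scores instead of learned networks:
# prior N(0, 1); for each observation x, the single-observation posterior
# is N(x/2, 1/2) with score -2*theta + x.
prior_score = lambda t: -t
obs = [1.0, 2.0, 3.0]
single_scores = [(lambda t, x=x: -2.0 * t + x) for x in obs]
sample = langevin_sample(lambda t: combined_score(t, single_scores, prior_score),
                         np.zeros(1))
print(sample)   # one draw, roughly from the true joint posterior N(1.5, 0.25)
```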
Fight detection in videos is an emerging deep learning application, given today's prevalence of surveillance systems and streaming media. Previous work has mainly relied on action recognition techniques to tackle this problem. In this paper, we propose a simple but effective method that approaches the task from a new perspective: we design the fight detection model as a composition of an action-aware feature extractor and an anomaly score generator. In addition, since collecting frame-level labels for videos is too laborious, we design a weakly supervised two-stage training scheme in which we train the score generator with a multiple-instance learning loss computed on video-level labels, and adopt a self-training technique to further improve its performance. Extensive experiments on a publicly available large-scale dataset, UBI-Fights, demonstrate the effectiveness of our method, whose performance on this dataset surpasses several previous state-of-the-art methods. Furthermore, we collect a new dataset, VFD-2000, dedicated to video fight detection, which is larger than existing datasets and covers more scenes. The implementation of our method and the proposed dataset will be publicly available at https://github.com/hepta-col/videofightdetection.
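One common form of a multiple-instance learning loss on video-level labels, shown only to make the weakly supervised setup concrete; the top-k pooling, the binary cross-entropy, and the toy scores are assumptions, not necessarily the paper's exact choices.

```python
import numpy as np

def mil_loss(frame_scores, video_label, k=3, eps=1e-8):
    """Pool the top-k per-frame anomaly scores into a video-level score and
    apply binary cross-entropy against the video-level label.

    frame_scores : (T,) per-frame anomaly scores in [0, 1]
    video_label  : 1 if the video contains a fight anywhere, else 0
    """
    top_k = np.sort(frame_scores)[-k:]
    video_score = np.clip(top_k.mean(), eps, 1 - eps)
    return -(video_label * np.log(video_score)
             + (1 - video_label) * np.log(1 - video_score))

pos = np.array([0.1, 0.2, 0.9, 0.8, 0.95, 0.1])   # fight somewhere in the clip
neg = np.array([0.1, 0.2, 0.15, 0.05, 0.1, 0.2])  # normal video
print(mil_loss(pos, 1), mil_loss(neg, 0))         # both losses are small
```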
Normalizing flows are a popular approach for constructing probabilistic and generative models. However, maximum-likelihood training of flows is challenging because it requires computing expensive determinants of Jacobians. This paper addresses this challenge by introducing an approach to training flows inspired by two-sample testing. Central to our framework is the energy objective, a multidimensional extension of proper scoring rules that admits efficient estimators based on random projections and that outperforms a range of alternative two-sample objectives that can be derived within our framework. Crucially, the energy objective and its alternatives do not require computing determinants and therefore support general flow architectures that are not well suited to maximum-likelihood training (e.g., densely connected networks). We empirically demonstrate that energy flows achieve competitive generative modeling performance while maintaining fast sample generation and posterior inference.
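For intuition, here is a plain sample-based estimator of the energy distance, a determinant-free two-sample objective of the kind discussed above. This generic statistic omits the paper's random-projection estimator and scoring-rule derivation; it is only meant to show why no Jacobians are needed when training is framed as matching samples.

```python
import numpy as np

def energy_distance(x, y):
    """Sample estimate of the energy distance
        D(P, Q) = 2 E||X - Y|| - E||X - X'|| - E||Y - Y'||.

    x : (n, d) samples from the model,  y : (m, d) data samples
    """
    def mean_pdist(a, b):
        diff = a[:, None, :] - b[None, :, :]
        return np.sqrt((diff ** 2).sum(-1)).mean()
    return 2 * mean_pdist(x, y) - mean_pdist(x, x) - mean_pdist(y, y)

rng = np.random.default_rng(0)
data       = rng.normal(0.0, 1.0, size=(256, 2))
model_good = rng.normal(0.0, 1.0, size=(256, 2))
model_bad  = rng.normal(3.0, 1.0, size=(256, 2))
print(energy_distance(model_good, data))  # close to 0
print(energy_distance(model_bad, data))   # clearly larger
```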
Estimating the effect of an intervention while accounting for confounding variables is a key task in causal inference. Often, the confounders are unobserved, but we have access to large amounts of unstructured data (images, text) that contain valuable proxy signals about the missing confounders. This paper shows that leveraging unstructured data, typically left unused by existing algorithms, improves the accuracy of causal effect estimation. Specifically, we introduce deep multi-modal structural equations, a generative model in which the confounders are latent variables and the unstructured data are proxy variables. The model supports multiple multi-modal proxies (images, text) as well as missing data. We empirically demonstrate, on tasks in genomics and healthcare, that our method corrects for confounding using unstructured inputs, potentially enabling the use of large amounts of data previously not used in causal inference.
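A schematic simulation of the causal structure such a model assumes: a latent confounder drives the treatment, the outcome, and low-dimensional stand-ins for the image and text proxies. All functional forms, coefficients, and names are made up for illustration; this is not the paper's neural parameterization.

```python
import numpy as np

def simulate_proxy_confounding(n=1000, seed=0):
    """Simulate the assumed graph: latent confounder z -> (proxies, treatment t,
    outcome y). The proxies are what the unstructured data (images, text) would
    carry information about."""
    rng = np.random.default_rng(seed)
    z = rng.normal(size=n)                                    # unobserved confounder
    proxy_img = z[:, None] + 0.5 * rng.normal(size=(n, 4))    # "image" proxy
    proxy_txt = -z[:, None] + 0.5 * rng.normal(size=(n, 3))   # "text" proxy
    t = (rng.random(n) < 1 / (1 + np.exp(-z))).astype(float)  # confounded treatment
    y = 2.0 * t + 1.5 * z + rng.normal(size=n)                # outcome, true effect 2.0
    return dict(z=z, proxy_img=proxy_img, proxy_txt=proxy_txt, t=t, y=y)

d = simulate_proxy_confounding()
naive_ate = d["y"][d["t"] == 1].mean() - d["y"][d["t"] == 0].mean()
print(f"naive difference in means: {naive_ate:.2f} (true effect is 2.0)")
```

The naive estimate overshoots the true effect because the confounder raises both the treatment probability and the outcome; this is the gap that conditioning on the proxies is meant to close.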
3D scanning is a complex multi-stage process that produces a point cloud of an object with corrupted parts caused by occlusions, reflections, shadows, scanner motion, specific properties of the object surface, imperfect reconstruction algorithms, and so on. Point cloud completion aims to fill in the missing parts of the object and obtain a high-quality 3D representation of it. Existing completion methods perform well on academic datasets with predefined object classes and very specific types of defects; however, their performance drops in real-world settings and degrades further on previously unseen object classes. We propose a novel framework that performs well on symmetric objects, which are ubiquitous in man-made environments. Unlike learning-based methods, the proposed framework does not require training data and is able to complete non-critical damage occurring during customer 3D scanning with, e.g., Kinect, time-of-flight, or structured-light scanners. Through thorough experiments, we show that the proposed framework achieves state-of-the-art efficiency in point cloud completion of real-world customer scans. We benchmark the framework's performance on two types of datasets: properly augmented existing academic datasets and real 3D scans of various objects.
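A minimal sketch of the symmetry idea, assuming the symmetry plane is already known: mirroring the observed points across the plane yields candidate points for the missing part. Estimating the plane from the scan and validating the completion, which a full framework would have to do, are omitted, and the function names are illustrative.

```python
import numpy as np

def reflect_across_plane(points, normal, offset=0.0):
    """Mirror points across the plane n·x = offset. For an object symmetric
    about that plane, the mirrored observed points are a candidate completion
    of the missing part."""
    n = np.asarray(normal, dtype=float)
    n = n / np.linalg.norm(n)
    signed_dist = points @ n - offset
    return points - 2.0 * signed_dist[:, None] * n[None, :]

def complete_symmetric(points, normal, offset=0.0):
    """Merge the observed points with their mirror image."""
    return np.vstack([points, reflect_across_plane(points, normal, offset)])

# Half of a unit sphere's surface (x >= 0), completed by mirroring across x = 0.
rng = np.random.default_rng(0)
pts = rng.normal(size=(2000, 3))
pts /= np.linalg.norm(pts, axis=1, keepdims=True)
half = pts[pts[:, 0] >= 0]
full = complete_symmetric(half, normal=[1.0, 0.0, 0.0])
print(half.shape, "->", full.shape)
```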